Tong Yu

Sam

F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking

May 13, 2026
Via arXiv

MASS-DPO: Multi-negative Active Sample Selection for Direct Policy Optimization

May 11, 2026
Via arXiv

FERA: Uncertainty-Aware Federated Reasoning for Large Language Models

May 11, 2026
Via arXiv

Skill-R1: Agent Skill Evolution via Reinforcement Learning

May 10, 2026
Via arXiv

A Survey on LLM-based Conversational User Simulation

Apr 27, 2026
Via arXiv

WS-GRPO: Weakly-Supervised Group-Relative Policy Optimization for Rollout-Efficient Reasoning

Feb 19, 2026
Via arXiv

AMPS: Adaptive Modality Preference Steering via Functional Entropy

Feb 13, 2026
Via arXiv

ThinkRouter: Efficient Reasoning via Routing Thinking between Latent and Discrete Spaces

Feb 12, 2026
Via arXiv

Layer-adaptive Expert Pruning for Pre-Training of Mixture-of-Experts Large Language Models

Jan 20, 2026
Via arXiv

Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications

Jan 05, 2026
Via arXiv